SOFTWARE IMPLEMENTATION OF DISCRETE CONVOLUTION AND 
FOURIER TRANSFORM ALGORITHMS 


by 

SATISH fAVETI 


K 177 i 



DEPARTMENT OF ELECTRICAL ENGINEERING 

MDIAN INSTITUTE OF TECHNOLOGY, KANPUR 


FEBRUARY, 1989 


SOFTWARE IMPLEMENTATION OF DISCRETE CONVOLUTION AND 
FOURIER TRANSFORM ALGORITHMS 


A Thesis Submitted 

in Partial Fulfilment of the Requirements 
for the Degree of 

MASTER OF TECHNOLOGY 


V 


by 

SATIS H KAVETI 


to the 

DEPARTMENT OF ELECTRICAL ENGINEERING 

INDIAN INSTITUTE OF TECHNOLOGY. KANPUR 


FEBRUARY. 1989 



CERTIFICATE 


This is to certify that this thesis titled ” SOFTWARE 
IMPLEMENTATION OF DISCRETE CONVOLUTION AND FOURIER TRANSFORM 
ALGORITHMS ” by Mr. Satish Kaveti has been carried out under my 
supervision and the same has not been submitted else where for a 
degree . 

February, 1989. ( Prof. M.U.Siddiqi ) 

Department of Electrical Engineering 
Indian Institute of Technology 
Kanpur 




CE^^TR^^L 


^ * ,> '-JJ A n \f 

... • I V . ^ : 


I t 


Acc. No, 


t .. 


K ' ' 




0 o 6 li ii 



£E' kpsv~ Tof- 



ACKNOWLEDGEMENTS 


It gives roe iroroense pleasure in expressing roy heartfelt 
thanks and deep sense of gratitute to roy thesis supervisor. 
Prof. M. U. Siddiqi for his valuable guidance and constant 
encourageroent . 

I wish to thank ray friends RaroPrasad, Vemuri, Suneel, 
Banerjee, Sandhu, Verghese and Kushal for helping me at different 
stages of my thesis. 


Satish Kaveti 



ABSTRACT 


In this thesis, we have given software inpl enentat ion of 
various algorithms for discrete convolution and Fourier transform. 
Uinograd short convolution based on the polynomial formulation and 
the matrix formulation have been implemented. Agarwal-Cool ey 
algorithm is employed for convolving large sequences by breaking a 
long convolution into short convolutions. Short convolutions are 
computed by the matrix based formulation of Uinograd algorithm. A 
better algorithm for multidimensional convolution is proposed. 

The DFT algorithms implemented include the mixed-radix 
Cooley-Tukey algorithm, radix-2 fast Fourier transform and the 
Good-Thomas algorithm. Rader’s algorithm which computes the 
Fourier transform of sequences of length equal to a power of an 
odd prime by converting it into convolutions is also implemented. 
The algorithm incorporates a scheme through which the DFT of a 
real sequence can be computed by real multiplications only, 
thereby achieving a reduction in computational complexity. 
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CHAPTER ONE 


INTRODUCTION 

Computation of discrete Fourier transform has been gaining 

importance over the last few years. This is a direct consequence 

of the major role played by digital filtering and DFTs in digital 

signal processing and by the increasing use of digital processing 

techniques made possible by the declining cost of hardware. The 

motivation for developing fast algorithms is strongly rooted in 

the fact that the direct computation of length N convolutions and 

2 

DFTs requires a number of operations proportional to N which 
becomes rapidly excessive for long dimensions. This, in turn, 
implies an excessively large requirement for computer 
implementation of the methods. 

1.1 Historical Background 

Historically, the most important event in the fast algorithm 
has been the fast Fourier transf orm(FFT) , introduced by Cooley and 
Tukey in 1965[1], which computes DFTs with a number of operations 
proportional to N'logN and therefore reduces drastically the 
computational complexity for large N. Algorithms were available 
earlier for fast computation of DFT for sequences of length equal 
to a power of 2, but the greatest emphasis, however, was on the 
computational economy that could be derived from the symmetries of 
the sine and cosine functions. The fast Fourier transform 

algorithm of Cooley and Tukey is more general in that it is 
applicable when N is a composite and not necessarily a power of 2. 

Another important contribution to computing the DFT was 
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nade by Thomas in 1963[23. Like the transform alfiorithm of Cooley 
and Tukey it also achieved its economy by performinfi one - 
dimensional Fourier analysis by doing multidimensional Fourier 
analysis. However, the algorithms are different in the following 
ways : 

1. In the Thomas algorithm the factors of N must be 
mutually prime. 

2. In the Thomas algorithm the calculation is precisely 
multidimensional Fourier analysis with no intervening 
phase shifts or "twiddle factors” as they have been 
called. 

3. The correspondences between the one - dimensional index 
and the multidimensional indexes are quite different. 

It was shown by Good[3] that if N is composite, with mutually 
prime factors, i.e., N = ni'n 2 ' .... nn,. one could do a one 
dimensional Fourier analysis of N points by doing a - dimensional 
Fourier analysis on an m - dimensional, rj x r 2 x .... x rg, array 
of points. Uith these ideas put together and developed, Good’s 
paper contains a full generalization of the Thomas prime factor 
algorithm. 

The algorithm proposed by Cooley and Tukey was quite 
general, but later authors [4] have directed their attention to 
the special case of N = z" than for the general case, and the 
restricted choice of N is adequate for the majority of 
applications. Gentleman and Sande [5] have extended the 
development of the general case and described the possible 
variations in organizing the algorithm. Further, Singleton [6] 


presented an al£orithra for computing the fast Fourier transform, 
based on the Cooley - Tukey aleorithm for computing a transform 
step corresponding to an odd factor of N. The algorithm included 
an efficient method for permuting the results in place. 

All the algorithms for the FFT require that the number of 
data samples, N, be highly composite to gain in terras of 
computation effort. Rader[7] shoved that the DFT of N points, 
where N is a prime is essentially a cyclic convolution of length 
of (N-1). This allowed the DFT to be computed by means of a FFT 
algorithm, with the associated increase in speed, even though N is 
prime. 

Convolution can be computed by DFT because of its cyclic 
convolution property. Therefore the FFT can be used to compute 
convolutions and thus the number of operations becomes 
proportional to N-logN. But the Fourier transform approach has an 
inherent disadvantage, due to the use of complex numbers even when 
the given sequences were real and the inability to give exact 
answers because of use of the irrational functions, namely sine 
and cosine. 

For the computation of convolution the most significant 
contribution was that by Uinograd in 1976. The algorithm proposed 
by lJinograd[8,9] achieved a theoretical reduction of computational 
complexity over the FFT by a method which can be viewed as a 
converse of FFT, because it computes DFT as a convolution. 

Another important factor in the development of new 
algorithms was the recognition that convolutions and DFTs can be 
viewed as operations over in finite rings and fields of integers 
and polynomials. This new point of view has allowed both 




derivation of some lover computational bounds and desifin of new 
and improved computational techniques such as those based on 
polynomial transforms[10] and number theoretic transforms[ 11 ] . 


The Uinograd 

L 

algorithm 

while being 

efficient 

for 

smal 1 

■■ 

lengths of the data 

sequences 

becomes very 

cumbersome 

for 

large 


values of N. Aearwal & Cooley[12] fiave an algorithm for converting 
a problem of convolving long sequences into convolution of shorter 
sub - sequences based on the Chinese Remainder Theorem, 
i 2. Present work 

The present work is concerned with the development and 
implementation of algorithms for discrete convolution and discrete 
Fourier transform. The treatment can be broadly divided into 
three sections : 

1. Number theory and polynomial algebra. 

2. Fast convolution algorithms. 

3. Fast Fourier transform algorithms. 

The first section is concerned with the development of 
algorithms for the mathematical tools which are used often in 
computation of discrete convolutions and discrete Fourier 
transforms. 

Euclidean algorithm for finding the greatest common divisor 
and inverse of two integers have been implemented. Chinese 
Remainder theorem for integers is discussed along with its 
algorithm. An algorithm for finding the primitive root over GFCpD 
is proposed. Various operations on polynomials which are 
implemented include the basic operations like addition, 
multiplication, division and modulo on polynomials over real and 


rational fields and integer ring. A better algorithm for division 
of polynomials over rational field which is less susceptible to 
overflow is proposed. The Euclidean algorithm is extended to 
polynomials over rational field and an algorithm for finding the 
inverse of a polynomial is implemented. The Chinese Remainder 
theorem for fast multiplication of polynomials when the modulo 
polynomial is monic is implemented. An algorithm for factorizing 
polynomials of the form is also implemented. 

The second section is concerned with algorithms for 
computation of discrete convolutions. Uinograd algorithm for short 
convolutions is implemented. The polynomial and matrix based 
formulation of the Uinograd algorithm have been dealt with. The 
matrix based formulation is used for development of algorithms for 
multidimensional convolution. A memory efficient and a faster 
algorithm for convolving large sequences using algorithms for 
short convolutions is implemented based on the Agarwal-Cooley 
algorithm. The convolution of two-dimensional sequences is done by 
nested convolution algorithm. 

The third section deals with fast algorithms for discrete 
Fourier transform. The treatment starts with the mixed-radix 
Cooley Tukey algorithm and radix-2 Fast Fourier Transform along 
with it’s algorithmic implementation. The Good Thomas algorithm 
for transforming a one-dimensional Fourier transform is 
implemented. The procedure includes an efficient algorithm for 
factorization of length N into relatively prime factors nj[, 
i=l,2,... ,k where n^ can be expressed as ni = Pi and each pi’s 
are distinct. The Good Thomas algorithm utilizes the Cooley-Tukey 
FFT algorithm for ni - point DFT. A computationally efficient 


algorithm for DFT of r«al sequences with length equal to a power 

of an odd prime is implemented. The algorithm is based on Rader's 

algorithm for converting a DFT into a convolution and uses the 
Uinograd algorithm for convolving the resulting sequences. The 
algorithm requires re-indexing if the input and output sequences 

and utilizes the symmetry property of the sine and cosine 

functions to minimize the computational effort. 

The algorithms have been implemented on a IBH compatible 
personal computer. The language used for their implementation is 
Turbo Pascal Version 4.0. The appendix lists the various procedures 
available with brief comment on their usage. 
i.3 Structure of the thesis 

Chapter 2 gives a brief account of various mathematical 
concepts required for understanding the algorithms presented in 
this thesis. Algorithms for the frequently used ideas of number 
theory and polynomial algebra are also given. 

Chapter 3 is devoted to the development of algorithms for 
discrete convolution. 

Chapter 4 discusses various fast algorithms for the 
computation of discrete Fourier transform. 

Finally, the last chapter concludes the thesis and discusses 
about the scope for future work. 

Appendix A gives a list of all the Pascal procedures and 
functions implemented along with a brief comment on their usage. 



CHAPTER TUO 


ALGEBRAIC BACKGROUND 

This chapter introduces the necessary backfiround required to 
understand various convolution and DFT algorithns in a simple, 
eeneral way, with the intent of familiarization with the 
mathematical principles that are most frequently used in this 
thesis. After every section an alfiorithmic implementation of the 
methods discussed in that section is presented. 

The chapter is broadly divided into two sections : 

1. Elementary number theory 

2. Polynomial algebra 

In the elementary number theory the following topics have 
been discussed : 

1. Euclidean algorithm 

2. Chinese Remainder theorem for integers 

3. Primitive roots of finite fields 

In the polynomial algebra section the Euclidean algorithm 
and the Chinese Remainder Theorem are extended to polynomials. The 
relationship between polynomial algebra and convolution is 
introduced. A discussion on cyclotomic polynomials and the scheme 
for their generation follows. 

2.1 Elementary Number Theory 

2.1.1 Euclidean Algorithm 

The greatest common divisor (gcd) can be found easily by a 
division algorithm known as the Euclidean algorithmXlS] . 



( 2 . 1 ) 


Also, if a and b are relatively prime, i.e., if 

GCD(a,b) = (a.b) = 1 
then the Euclidean algorithm can be used to find an ’’nteger c such 
that 

a- c = 1 mod b (2.2) 

Let us assume that a and b are positive integers. Dividing a 
by b yields 




a = b-qi 

+ ri , ri < b 

(2.3) 


by 

def inition , 

<l=(a,b) 

i a or b . Therefore 

, if r =0, 

1 

b 1 a and 

(a. 

b)=b. If r 

+ 0. 

continuation of this 

procedure , 

the the 


following system of equations are obtained : 


ri •q2 

+ '2 . 

'2 < '‘I 

'‘I '<13 

+ ^2 * 

'3 < '2 


(2.4) 

rk-2 = ric-i-qjc + <rk-i 

•^k-l = ‘^k’lk+l 

Since r^ > r 2 > r 3 > .... the last remainder is zero. Thus by last 

equation, r |r . The preceding equation implies that r |r , 
k' k-1 k k-2 

since r |r Finally we obtain r lb and r la. Hence r is a 

k k-1 k k k 

divisor of a and b. 


An important consequence of Euclidean Algorithm is that the 
GCD of two integers a and b is a linear combination of a and b. 
this can be seen by rewriting (2.3) and (2.4) as 

ri = a - b-qi 
r2 = b - rj •q2 

(2.5) 


^k = '‘k-2 " '‘k-1 •‘Ik 


The first equation shows that is a linear combination of a and 



b. The second equation shows that r 2 is a linear coobination of b 
and ri and therefore of both a and b. Finally the last equation 
implies that rj^^ is a linear combination of a and b. Since 
rjj=(a,b), we have 

(a,b)=ma+nb (2.6) 

where m and n are intecers. When a and b are relatively prime, 
(2.6) reduces to Bezout’s relation 

1 = roa + nb (2.7) 

The value of m and n can be found from equation. (2.4) and (2.5). 
From equation (2.7) 

ma = 1 mod b (2.8) 

i.e., m is an inverse of a mod b. 

The following set of recursive equations can be used to find m and 
n : 

®k = «k-2 ■ <Jk+l'«k-l 

^'k = «k-2 “ <lk+l*«k-l (2.9) 

where 

r'k = rk-2 - rjc-i-qk 

m_2 = 1 > n_2 = 0 (2.10) 

m_j^ = 0 , *^-1 “ 1 

r_i = a , ro = b. 

The equations (2.9) are to be applied until rj^ = 1 . 

2 . 1 . lA Algorithm to find inverse : Given relatively prime 

integers a and b such that a ^ b, the following algorithm will 
find m,n satisfying (2.7) 

1. Set rj) = b and r-j =a. Set n _2 =1 and n _2 = 0 and m_i = 0 
and n_j=l . Set k=l. 



2. Divide rjj _2 by to obtain quotient qj^ and remainder 

rjj . Put 

®k-l = ®k-3 ‘ «Jk‘®k-2 
and njt-i = nk _3 - qk-nic _2 

3. I f r = 1 the aleorithm terminates with n = ®k-l 
n=nj^_l. Otherwise set k=k+l and return to step 2. 


2.1.2 Chinese Remainder Theorem 


In this section the problem of solving a set of simultaneous 
linear congruences with different moduli is discussed. The 
solution is given by the Chinese Remainder theorem(CRT) . This 
theorem has extensive applications in various signal processing 
algorithms. The theorem is stated here without the proof. 

Theorem 2.1 : Let m^ be k positive integers greater than 1 
and relatively prime in pairs. The set of linear congruences 

X = r^ mod mj, (2.11) 

has a unique solution modulo M, with M = tt mj^ . 

Ifathematically , given the residues rj’s, the solution of 
(2.11) can be found uniquely by 


k 

X = E ( M / mi )-ri-Ti 

i = l 
k 

= E Mi-Ni-ri 
i=l 


where 


IIi= H / mi 
Hi'Ni = 1 mod mi 

= 0 mod m , k^i 


( 2 . 12 ) 


(2.13) 



The the integer can be found by Euclid’s algorithin, as is 

the inverse of . 

2 . 1 . 2A Alfiorithn for Chinese Renainder Theoren : Given a set 

of relatively prime positive integers { , i=l,...k } greater 

than 1, this algorithm will compute and satisfying (2.13). 

k 

1. Compute M = TT m^ . 

i = l 

2. Set i = 1. 

4. Set Mi = M / mj^ . Use the Euclidean algorithm to find 
the inverse of Mj^ mod m^. Let the inverse be called 
Ni. 

4. If i i k then return to step 3. Otherwise the algorithm 
terminates . 

2.1.3 Primitive roots 

Consider a prime field GF(p) where p is a prime number. In 
GF(p) one can always find an integer a < p such that a^*^ ^^ = 1 and 

for any other integer j < (p-1) , a'^ ^ 1. The integer a is said to 

be having an order (p-1) and is called the primitive element in 
GF(p). Some of the important theorems of number theory which will 
be used later are stated in the following paragraphs. 

Definition : The number of integers that are smaller than m and 

relatively prime to m is denoted by 0(m) and called Euler totient 
function. 

Uhen m is prime, with m=p, all integers smaller than p are 
relatively prime to p. Thus 


0(P) = p-1 



c 

If ■ = p , the only numbers less than m and not prime with m are 
the multiples of p. Therefore 

acp*") = 

Theorem 2.2 If (a,m) = 1, then 

Theorem 2.3 If (e>m) = 1 and 6^ = 1 mod m, the order of the 
inte&er c must divide b. 

Theorem 2.4 If (fi.m) = 1, the order r of the integer g must divide 
0(m) . 

One of the important operation when working on finite fields is 
the search for a primitive element. For checking whether a the 
chosen integer n < p is primitive or not, one needs to check 
whether n** =1 or not, for every qj(p-l).The following is the 
algorithm for search of the primitive element. 

2 . 1 . 3A Algorithm for searching the primitive element : Given a 

prime number p, this algorithm will find the primitive root it in 
GF(p) such that 

=1 and 4= 1 , 0 < 1 < (p-1) 

1. Factorize (p-1) and store the factors in f [ i ] , i = l , . . . , m 
such that f[l]=2 and f(m]=Cp~l)* Set j =1. 

2. Set j=J+l and set k= j ; Set i=l. 

If k^:! then go to step 4. Otherwise, if i>m then the 
3 • 

algorithm terminates with i» = j else return to step 2. 

4. Set k=(j)^^^^ and set i=i+l. Return to step 3. 

The step 4 of the last algorithm involves integer exponentiation. 
For large values of prime p, the algorithm takes sufficiently long 
time. The next algorithm will consider fast exponentiation. 



2. 1 .3B 


Aleorithn for fast oxponentiation : Given positive 


integers a and b this algorithm will compute c such that 

b 

c = a 

1. Set c=l 

2. If b is odd then set c = c*a. 

3. Set a=a* and set b=b div 2. If b=0 then the algorithm 
terminates , else return to step 2. 

In the next section the Euclid’s algorithm and the Chinese 
Remainder theorem shall be extended for polynomials. The 
algorithms for the various algebraic operations on polynomials are 
given . 

Example 2.1; Consider p=17. To find out whether an integer k € 
GF(p) is a primitive root or not it is necessary to find it’s 
order. From Theorem 2.3 the order of k has to be a factor of CP“ 
1)=16. The possible orders are 2,4,8 and 16. 

8 

Let k=2. It is not primitive because 2 = 256 = 1 mod 17 and 

thus, the order of 2 is 8. 

Let k=3. 

k^ = 9 mod 17 = 9 
k^ = 81 mod 17 = 13 
k® = 169 mod 17 = 16 
k^^ = 256 mod 17 = 1 

Therefore, k is a primitive root in GF(17). The different powers 
of k can be computed using the fast exponentiation algorithm. 

In the next section the Euclid’s algorithm and the Chinese 
Remainder theorem are extended to polynomials. The algorithms for 
the various algebraic operations on polynomials are given. 



2.2 Polynomial Algabra 


Arithmetic on polynomials consists primarily of addition, 
subtraction, multiplication ; in some cases, further operations 
such as division, exponentiation, factoring, and taking the 
greatest common divisor are important. In this section a improved 
algorithm for division of polynomials over rational field is 
proposed. An algorithm for factorization of the polynomials of the 
form (x"-l) is also suggested. The treatment is starts with the 
simpler algorithms . 

The operation of addition and subtraction are straight 

forward. Ilult ipl ication of the polynomials is done by the rule, 

z* o r ^ s 

(Up'x + ... + uo)*(Va-x + ... + vq) = C w,.+g-x + ... + wq) 

where 

Wk = UQ-vic + ui-vjc-i + ... + ujc-i-vi + u^-vo 

In the later formula and vj are treated as zero if l>r or j<s. 
The division of polynomials is a relatively more difficult 
problem. 

2.2.1 Division of polynomials 

It is possible to divide one polynomial by another in 
essentially the same way that we divide one multi-precision 
integer by another, when arithmetic is done on polynomials over a 
field. The most important fields of coefficients that arise in 
applications are 

1. The real and complex numbers 

2. The rational numbers 

3. Integers modulo p where p is a prime. 


Given two polynomials u(x) and v(x) over a field, with 



v(x):|:0, u(x) Can be divided by v(x) to obtain a quotient 
polynomial q(x) and a remainder polynomial r(x) satisfying the 
conditions 

u(x) = q(x)-v(x) + rCx) C2.14) 

The following algorithm can be used to divide two polynomials over 
a field. The algorithm considers the most straight forward method. 
2.2.1A Algorithm for dividing polynomials over a field : Given 

polynomials u(x) and v(x) of degree m and n respectively, this 
algorithm finds the polynomial q(x) and r(x) satisfying (2.14) 

1. Do step 2 for k=m-n,ra-n-l , . . . , 0 ; then the algorithm 

terminates with ... ,ro) = CUn-l» ••• >'^0^ • 

2. Set Qjf = ( Ujj+ic / Vjj), and then set uj = uj - qk"''^j-k 

for j= n+k-l,n+k-2, ... ,k (The latter operation 
amounts to replacing u(x) by u(x) - qic'X •v(k), a 
polynomial of degree < n+k ) . 

Uhile the above algorithm works perfectly well for division of 
real polynomials , for polynomials over integer rings suitable 
modifications are required because of non - existence of an 
inverse. Of course, situations when an integer polynomial u(x) is 
perfectly divisible by integer polynomial v(x), and ,vhen v(x) is 
a monic polynomial the above algorithm continues to hold valid. 

An algorithm was proposed by KnuthtlS] to overcome the 
problem of division of integer polynomials. It can be seen that 
the above algorithm requires explicit division only by the leading 
coefficient of v(x), written as l(v), and that the step 2 is 
carried out exactly (m-n+1) times; thus if u(x) and v(x) start 
with integer coefficients, and if the algebraic structure over 



which the operations are done are rational nuaebers, then the only 
denominators that appear in the coefficients of qCx) and r(x) are 
divisors of l(v)*™ ^ This suggests that the polynomials q'(x) 
and r'(x) can be found such that 

ICv)®* = q’(x)-v(x) + r’(x) (2.15) 

where m=deB(u) and n=defi(v), for any polynomials uCx) and v(x)^0, 

provided that min. The values q(x) and r(x) can be found from 

q’Cx) and r’(x) as follows, 

q(x) = q’(x) / l(v)“ 

r(x) = r’(x) / l(v)"“"^^ (2.16) 

The integer divisors in the above expressions are stored as 

separate variable as the field of operation is rational. The above 
algorithm was adequate for cases when l(v) and (m-n) are small 

integers, for other cases the algorithm results in an early 

overflow. 

An algorithm is proposed by which such a problem could be 
avoided. For convenience of explanation, the content and primitive 
part of a polynomial are defined. The content of a polynomial u(x), 
abbreviated as cont(u), is the gcd of the coefficients of u(x),and 
the primitive part of u(x), abbreviated as pp(u), is u(x) /cont (u) . 
The following steps outline the algorithm. 

2. 2. IB Division of integer polynomials over rational field : 

Given polynomials u(x) and v(x) over integer ring, this algorithm 
will find q(x) and r(x) over rational field satisfying (2.14). 

1. Set num=cont(u) and den=cont(v) ; set v(x)=pp(v) and 
set u(x)=pp(u) , set t=l. 

2. Do step 3 for k = m-n,m-n-l 0 (The integer 

indicates the degree of the quotient polynomial); then 



CO to step 4. 

3. Set X = ficcl(l(u) ICv)) ; Do the following settings ; 

s= Vn / n , t =t-s , - Uf^+it / x, 

Uj = Uj*Vfj/x - '^n+k' j=n+k-l, ... 1,0 
Qi = Qi’Vjj/x for i=iB-n k+1 . 

4. Set u(x)= (num/t ) ■ u (x) and q(x)=(nu«/(t- den) )• q(x) ,the 
algorithm terminates with rCx)=uCx). 

In the above algorithm the division in step 3 is done over integer 
ring, but in step 4 the common multipliers and divisors are stored 
in different variables instead of actual multiplication and 
division . 

Example 2.2 : Consider polynomials u(x) and v(x) such that 

u(x) = X® + - 3x^ - 3x^ + 8x^ + 2x - 5 

=(1010-3-382-5) 
v(x) = 3x^ + 5x^ - 4x^ - 9x + 21 
=( 3 0 5 0 -4 -9 21) 

When using (2.15) and (2.16), u(x) is replaced by 
u(x) • 1 (v)®”*^^^ = u(x)*3^ = 27*u(x). So the new value of u(x) is 

u(x) = (27 0 27 0 -81 -81 216 54 -135) 

Now u(x) can be divided by v(x) to obtain 

q'(x) =(90 -6) 
r'(x) = (-15 050 -15) 

It is seen that the quotient and the remainder have a common 
factor 3. Also the starting value of u(x) has large coefficients 
even though the given polynomial has small coefficients. 

When using Algorithm 2. 2. IB for dividing, the resulting 
coefficients are of smaller magnitude, but the algorithm requires 



computation of gcd at every step. 


The values of u(x),q(x) and r(x) after each iteration are 
indicated in the table belov : 


v(x) =(3050-4-9 21) 
u(x) =(1010-3-382 -5) 


u(x) 

r(x) 

q(x) 

3 0 3 0 -9 -9 24 6 -15 

0-20-50 36 -15 

-6 0 -15 0 9 18 -45 

0-20-5036 -15 

-20-5036 -15 

0-5010-3 

1 

1 0 

3 0-2 


From , algorithm 2 . 2 . IB 

r(x) =(-5010 -3)/9 
and q(x) =(30 -2)/9 

2.2.2 Euclidean Algorithm for polynomials : In this section the 
set of equations for the Euclid's algorithm will be presented 
without dwelling into the the theory. 

Consider polynomials Il(x) and m(x) with integer coefficients 
such that they are relatively prime to each other ; if, the 
deg(n(x)) i deg(m(x)), then the Euclid’s algorithm can be used to 
find N(x) such that, 

H(x)-N(x) = 1 mod m(x) (2.17) 


The following set of recursive equations can be used to find R(x) 

NfcCx) = Nk_2(x) - qk+i(x)-Nk_l(x) (2.18) 

where 

rk(x) = ric_2(x) - ric-i(x) • qk(x) 

N_ 2 (x) = 1 . N_i(x) = 0 , (2.19) 




r_i(x) = a . ro(x) = b 

The equations (2.9) are to be applied until rj^Cx) = 1 . 

2.2.3 Chinese Kenainder Theorem over polynomial fields : The 
convolution algorithm can be alternatively stated as a product of 
polynomials. Therefore, if {s^} is obtained from a convolution of 
{g^} and {d^} then the operation can be written as 

s(x) = g(x)-d(x) mod mCx) (2.20) 

Suppose that m(x) is a product of k polynomials mi(x) having no 
common factors, i.e., 

k 

m(x) = Ti m£(x) (2.21) 

i = l 

Since each of these polynomials m^fx) is relatively prime with all 
the other polynomials m^fx) , it has an inverse modulo every other 
polynomial. This means that we can extend the Chinese Remainder 
Theorem to the ring of polynomials modulo m(x) and therefore 
express uniquely s(x) as a function of the polynomials sjfx) 
obtained by reducing a(x) modulo the various polynomials mi(x). 
The Chinese Remainder theorem is then expressed as 

k 

s(x) = E Si(x)*Mi(x)-Ni(x) mod m(x) (2.22) 
i = l 

where, for every value of u and i 

Hi(x) = m(x)/miCx) (2.23) 

and 

n (x)*N (x) 5 0 mod m (x), i^u 
u u i 

= 1 mod m^Cx) 

The validity of the above can be easily verified. To compute Ny(x) 



the Euclidean algorithre can be used. 

The residues s; (x) can be evaluated using the residues g^Cx) 
and d^Cx) by 

Si(x) = ei(x)-di(x) mod tti(x) (2-24) 

2 . 2 . 3A Algorithm for CRT over polynomial rings ; Given real 

polynomials, d(x) and g(x) and an integer polynomial ra(x), this 
algorithm will find s(x) satisfying (2.20) using (2.21) - (2.24). 

1. Factorize m(x), store the factors as mj[(x) , i = l,2 k 

2. For i = l to k , compute (x)=ra(x)/mi (x) . 

3. For i=l to k , use the Euclidean algorithm to compute 

Ni(x) such that 

Mi(x)-Ni(x) = 1 mod m^fx). 

4. Compute the residues d^fx) and g£(x). Evaluate s^fx) 
satisfying (2.24). 

5. Set s(x) = 0. 

6. Do step 7 for i=l to k , then the algorithm terminates. 

7. Set 8(x) = s(x) + (x) ' Ni (x) • Si (x) . 

Example 2.3 : Consider 

m(x) = x^ - 1 

= (x-D- (x+D- (x^-1) 

= rai(x)*m2(x)*m3(x) 

From (2.23) 

IIl(x) = ( x^ + x^ + X + 1) 

M2(x) = ( x^ - x^ + X - 1) 

M3(x) = ( x^ - 1) 

Using Euclidean algorithm the inverse Ni(x), i=l,2,3 satisfying 
(2.23) can be computed. From (2 . 17 ) , (2 . 18) and (2.19), 


Ni(x) = 1/4 



-1/4 


N2Cx) = 

N 3 CX) = - 1/2 . 

The operation which is done very often is finding the residues. In 
the next paragraph an algorithm is given for finding the residue 
when the modulo polynomial is roonic. This does not impose serious 
restriction, as for most of the signal processing applications it 
is monic. 

2 . 2 . 3B Algorithm for evaluation of residue polynomial when the 

modulo polynomial is monic : Given a real polynomial s(x) and an 
integer monic polynomial m(x) , this algorithm will find s’(x) such 
that 

s’(x) = s(x) mod m(x) 

1. If degCsCx)) < deg(m(x)), the algorithm terminates. 

2. Do step 3 for k = n-1 ,n-2 , . . . ,m ; the algorithm 
terminates with 8 ’(x)=s(x). 

3. Set Sj = Sj - sjj*Cj_ij+jn for j=k-l,k-2 k-m. 

2.2.4 Factorization of polynomials : Factorization of polynomials 
is not a simple problem. Efficient algorithms exist [13] for 
factorization of polynomials over finite field due to 
Berlekamp[ 16 ] . 

In this section the attention is restricted to the class of 
polynomials of the form x"-l. For most of the situations this is 
sufficient because polynomials of different form are used only for 
linear convolution. Under such circumstances , m(x) is decided on 
the basis of the factors , so that the residues Sj_(x) can be found 
with least effort. This implies that for those cases when linear 
convolution is desired, the factors of ra(x) are pre - determined. 



2.2.5 CyclotoBic polynomials : The polynomial x"-l can be vritten 
in terms of its prime factors 

x"-l = pi(x)-p 2 (x)-p 3 (x)- .... Picfx) (2.25) 

If the field over which factorization is done is the field of 
nationals, the prime factors are easy to find. These factors are 
called cyclotomic polynomials. 

For small n. The cyclotomic polynomials are easy to find by 
factorinfi x"-l. Clearly 

Ql(x) = x-1 

Few more polynomials shall be factored to see the general pattern. 
Let n=2 . Then 

x^-1 = (x-l)(x+l) = Qi(x)-Q 2 (x) 

Let n=3.Then 

x^-1 = (x-l)(x^+x+l) = Qi(x)-Q 2 Cx) 

Let n=4.Then 

x^-1 = (x-l)(x+l)(x^+l) = Qi(x)-Q 2 (x)-Q 4 (x) 

At each step, only one new polynomial occurs. The general case is 
described by the following theorems. The proofs of the theorems 
shall not be discussed. An interested reader may refer to 
Blahut[14] . 

Theorem 2.5 For each n 

degCQnCx)) = 0(n) (2.26) 

where 0(n) is the totient function. 

Theorem 2.6 For each n 

x"-l = u Qk(n) (2.28) 

k I n 

In the next paragraph an algorithm for the factorization of 
polynomials of the form x"-l is suggested. For signal processing 



appl icat ionc this operation can be done off-line and the factors 
can be stored at definite laemory locations. 

2 . 2 . 5A Alftorithn for factorization of polynoaialc of for» 

x”-l : Given a positive integer n ^ 1 , this algorithm will find 

the factors of x"-l over rational field. 

1. Let k be an array of integer and Factor an array of 
polynomials over integer ring ; Set ktl]=l and set 
Factortl]=(x-l) . Set 1=2 and j=l. 

I f 1 n then set 1 = 1 + 1; Set t(x)=x^-l. 

2 s 

3. for i=l to j do 

{ If k[ i ] 1 1 then do 

{ Set t(x) = t(x) / Factor[i] } 

> 

4 . Set j=j+l; Set k[j]=l and set Factor[ j )=t Cx) 

5. Set 1=1+1 ; if 1 S n return to step 2 otherwise the 
al£orithin terminates with Factor containing the factors 
of x"-l. 

In this chapter the basic mathematical tools needed for 
description of the various convolution and DFT algorithms were 
discussed. In the next chapter various convolution algorithms will 


be discussed. 



CHAPTER THREE 


DISCRETE CONVOLUTION 
3.1 Introduction 

The calculation of finite digital convolution 

N-1 

Vi = E (3.1) 

k=0 

has extensive applications in both the general - purpose computers 
and specially constructed digital processing devices. It is used 
to compute the auto - and cross - correlation functions, to design 
and implement finite impulse response and infinite impulse 
response digital filters, to solve difference equations, and to 
compute power spectra. 

Uhile the calculation of the convolution according to the 
defining formula (3.1) would require a number of multiplications 
and additions proportional to N* for large N, use of Fast Fourier 
Transform (FFT) has been able to reduce this to 0(N- logN) 
operations when N is a power of 2. To be more specific, consider 
the problem where hj^ , i= .... , -1,0,1,... is a periodic sequence 

of period N so that h^ = ^N+i* Then the discrete Fourier 

transf orn( DFT) 


X 


n 


(-2-1I* j-n-k/N) 

= E xjj • e 
k=0 


(3.2) 


has the property that the DFT’s Hji,Xj^ and Yjj, n=0 , 1 , 2 , . . . ,N-1 , of 
the three sequences hjc.xjc and yj^, k=0 , 1 , 2 , . . . ,N-1 , respectively. 



are related by 


Yj^ = Xn'Hn n = 0.1,2 N-1 (3.3) 

Since the FFT alftorithin enables one to calculate the DFT in 
0(N*1 o£(N)) operations, the entire convolution operation requires 
0(N-log(N)) operations. 

But still there remains few problems to be resolved before 
using the FFT algorithm for the computation of convolutions. One 
of the most important is when the hj^’s and x^’s are from the 
integer ring. The computationally efficient DFT method involves 
intermediate quantities, i.e., sines and cosines, which are 
irrational numbers, thereby making exact results impossible on a 
digital machine. Also, there is a need for doing complex 

arithmetic, even if the convolutions are real. 

To resolve the first problem Agarwal & Burrus[113 suggested 
number theoretic transforms to implement fast digital convolution. 
The transforms are defined on finite fields and rings of integers 
with arithmetic carried out modulo an integer. 

One more approach which does not require working in complex 
fields relies strongly on representation of convolution as 

polynomial multiplication and Chinese Remainder theorem. This 
approach was introduced in Chapter 2. Even though, the algorithm 
does not require operations on complex numbers, the computation of 
convolution requires a number of subsidiary operations like 
factorisation of modulo polynomial, computation of M^Cx) and 
(x) , evaluation of the residues , etc . , which reduces the 

efficiency of the algorithm. The Uinograd algorithm for convolving 
data sequences is based on the Chinese Remainder Theorem, and 

expresses the convolution operation as a sequence of pre 



additions, nultiplications and post additions. 

3.2 Uinograd Algorithm 

As already stated, the Uinograd algorithm is based on the 
Chinese remainder theorem. Therefore, for sake of convenience in 
understanding the algorithm the Chinese Remainder theorem is 
restated : 

If 

s(x) = g(x)*d(x) mod ra(x) (3.1) 

and the modulo polynomial mfx) is factorizable as 


k 

m(x) = Ti m^fx) (3.2) 

i = l 


then the problem of convolution can be broken as a sequence of 
smaller convolutions 

Si(x) = di(x)*gi(x) mod m^fx) (3.3) 

i = l,2 k 


where 


di(x) = d(x) mod m^fx) 

ei(x) = g(x) mod mi(x) 

The polynomial s(x) can be obtained 

Si(x),i=l,2 k by the Chinese Remainder 


(3.4) 

from its residues 
Theorem, given as 


k 

s(x) = E Si(x)*IIi(x)-Ni(x) 
i = l 

where 

Mi(x) = m(x)/mi(x) and 
Ili(x)-Ni(x) = 1 mod ffijfx) 


(3.5) 


(3.6) 


To see that the above set of equations actually do reduce 



the computational complexity , an example ie considered. 

Let the number of data points, N=8 , then 
m(x) = X - 1 

= (x-D- (x+D- Cx^ + 1)' (x^ + 1) 

= Qi(x)*02(x)-Q4(x)-Q8(x) 

= mi(x) • m2(x) • m3(x) • m4(x) 

Direct convolution would require N*=64 multiplications. Uhen using 
the Chinese Remainder Theorem the number multiplications would be 
1+1+4+16=22. Thus, by breaking the problem of convolution of large 
sequences into convolution of smaller sequences a reduction in the 
computational effort can be achieved. 

The coefficients of the product polynomial di(x)-gi(x) gives 
the values of the non-cyclic nj-point convolution of the 
coefficients of d^Cx) and g^Cx) [nj=deg(mj (x) )+l ] . Then according 
to discussion before, s^Cx) is the result of reducing this 
polynomial mod m^fx). It can be easily seen that dj[(x)'gi(x) can 
be computed by multiplying linear combinations of the coefficients 
of dj[(x) by the linear combinations of the coefficients of giCx). 
These coefficients are, in turn, linear combinations of d^’s and 
gj^'s, respectively. The set of products, so formed is, therefore, 
of the form 

S = (Bg)*(Ad) (3.7) 

where * denotes element by element multiplication. 

Substituting the s^Cx) in the CRT results in formulas for 
the s^’s as linear combinations of the above mentioned. Thus, one 
obtains the form 

8 = CS (3.8) 

Equations (3.7) and (3.8) give alternative scheme to convolve two 



sequences . 


To reduce the order of the A and B matrices, the Cook-Toom 
algorithm [14], other systematic procedures and even manual 
manipulation can be used. 

Example 3.1 : Consider 4 - point cyclic convolution of two real 

sequences { i = 0,l 4 } and {d^, i = 0,l 5}. In terms of 

polynomials whose coefficients are the sequences involved, this 
corresponds to 


s(x) = g(x)'d(x) mod (x^-1) 

As stated earlier in Example 2.3 

MiCx) = (x+l)(x^+l) ; Ni(x) = 1/4 

W 2 (x) = (x-l)(x^+l) ; N 2 (x) = -1/4 

MaCx) = (x^-1) ; N 3 (x) = -1/2 

The reduced polynomials 

di(x) = d(x) mod m£(x) 


are 

d^Cx) = do^ = do + di + d2 + d3 
2 

d2(x) = do = do - dj + d2 - d3 
d3(x) = do^ + di^ = Cdo - d2) + (d^ - d3)x 
The equations for 

fii(x) = g(x) mod m^Cx) 

are exactly the same form as those for dj^fx). The relation 

Si(x) = di(x)*gi(x) mod m^Cx) 




o3_.3 3^.3 3 

Si - do -£1 + dj -fio 

The calculation of S 3 (x) is exactly like complex multiplication 
and carried out as though x = V-l. The Cook - Toom algorithm can 
be used to compute sq^ and s^^, in 3 instead of 4 multiplications. 
The result is that it is necessary to compute the five products 

A ^ 1 

“O = «0 'fto 

2 2 
®1 = do 'fio 

,.3,3 3, 

®2 “ dp ‘(£0 + fil ) 

3 3 .3. 

®3 = £0 ■ (do - dj ) 

®4 = (do^ + dj^) 

In terms of these, the s^'^’s are 

1 

SQ = »0 
2 

SQ = mj 

3 

SQ ' ®2 “ ®4 
3 

®1 = ®2 ~ ®3 

The polynomial s^fx) whose coefficients are given above are then 
substituted in the Chinese Remainder theorem 

3 

s(x) = E Si(x)-MiCx)-Ni(x) 
i = l 

to give the final result, 

SQ ~ C®0 ®i)/4 + (®2 " ®4)/2 

sj = (ag ~ »i)/4 + (®2 “ »3)/2 

^2 “ C®o ®l)/4 “ C ®2 ~ ® 4)/2 

S3 = (ag - ffli )/4 - (ra2 - n3)/2 

If g£’s are the filter coefficients then they can be assumed 
to be fixed and used repeatedly for many d^ sequences. 



Accordinfily, the computation can be simplified by redefining the 
■jj's and combining the 1/2 and 1/4 factors with the gj['8. In terms 
of matrices the algorithm can written as, 


1111 


1-1 1-1 


1 1 - 1-1 


10-10 


0 10-1 


1111 


1-1 1-1 


2 0-20 


(1/4) 


2 - 2-2 2 


2 2 - 2-2 


G = Be 


and D = Ad 


Compute S such that 

S = G* D 

where • indicates component wise multiplication. 
Also , 


1110-1 


1-1 1-1 0 


11-101 


1 - 1-1 1 0 



Enter 


Enter 


Frequency Domain 


Uinofirad 


Convolution 


Convolution 



Exit Exit 


COMPARISON 0*= TWO OONVCLUrjOW M&THObi 









The result can be obtained by 
s = CS 

One of the important implication of representing convolution 
in terms of matrices, is its similarity to the DFT approach 
Fi&ure 1 gives an instructive comparison of a Uinograd convolution 
algorithm and a convolution computed by using a discrete Fourier 
transform. The transform, in this case, is unlike the DFT, is not a 
square transform but instead a rectangular transformation. The 
transform notation of the Uinograd algorithm has important 

consequences in multi - dimensional convolution algorithms. 

3.3 lIultidimen8ional_Convolutions 

3.3.1 Introduction 

This section is intended to familiarize the reader with 
the multidimensional convolution. The treatment is elementary and 
for a detailed study the reader is advised to refer to Blahut[20]. 

The convolution of two multidimensional sequences 

gCnj , . . . , n^.) , d(ni n^.) over a field is described by 

s(kj^,k2» • • • »k^) E gC^i»^2» • • • . . . ,k^—npl 

(3.9) 

The field F can be field of real numbers R, the field of complex 
numbers C, of any other finite or infinite field. 

An equivalent definition may be given in terms of the 
polynomial form of the above sequences 

s(xi, . . . ,Xj.) = d(xi Xj.) • g(xi , . . . ,Xj.) (3.10) 

For cyclic convolution (3.10) becomes 



s(xi , . . . ,Xf.) = d(xi Xj.)fi(xi Xj.) (3.11) 

j / N1 , . . Nr , . 

mod (xi -1 , (Xf. -1) 

The next section describes an algorithm for multidimensional 
convolution . 

3.3.2 Nested Convolution Alcorithm : For ease of analysis. only 
two - dimensional case will be considered. For convolutions of 
higher dimensions the following treatment can be easily extended. 

The two - dimensional cyclic convolution can be also written 
in terms of polynomials. Thus 

f w 

8(x,y) = g(x, y) • d(x, y) (mod x" -l)(mod x" -1) 

(3.12) 

where n' and n” need not be equal. A two-dimensional cyclic 
convolution satisfies the cyclic convolution theorem. If D and G 
are the two dimensional Fourier transforms of d and g, 
respectively, and ~ ®k’ .k" ’ ,k" » then S is the two- 

dimensional Fourier transform of s. Hence a two-dimensional cyclic 
convolution can be computed by using two-dimensional FFT 
algorithm. One can also use direct methods. 

The two-dimensional convolution may be easier to understand 
when it is written as a convolution of polynomials. It can easily 
be seen that a two-dimensional convolution can be written as 

Si'(y) = ^ fi(i’-k’ )^y^ (3.13) 

The convolution algorithms discussed in the previous sections, 
though derived in an arbitrary field, actually are valid in some 
rings, such as polynomial rings. Hence the convolution of 
polynomials can be computed by any of these fast algorithms. 
Addition in the fast algorithm become addition of polynomials, and 



multiplications become multiplications of polynomials. Those 
multiplications of polynomials are themselves convolution and in 
turn can be computed by any fast alfiorithm. Thus one vay to 
compute a two-dimensional convolution is obtained by nesting a 
fast algorithm for one-dimensional convolution inside another fast 
algorithm for one - dimensional convolution. The next paragraph 
presents an algorithm for fast two-dimensional convolution based 
on the nesting algorithm. 

3 . 3 . 2A Algorithm for two-dimensional convolution : Given two 

N’ and N” array 

d = { di- jr., i '=0,1 , . . . ,N' -1 ;i” = 0, 1 , . . . ,N”-1 } 

g = { gi',i», i’=0,l N’-l ;i'’ = 0,l, . . . .N”-! } 

the algorithm computes a new array 

s = < s^. i’=0,l N’-l ;i’' = 0,l N”-! } 

satisfying 


N’-l N"-! 

Si',i- = E E fi(i’-k’),Ci”-k’')-dk’ ,k” 

k’=0 k’' = 0 

i’ = 0,1,2, . . . ,N’-1 
i" = 0,1,2 N”-! 

1. Multiply every row of d by matrix A" and every row of g 
by the matrix B" . 

2. Multiply every column of the transformed d array by A' 
and every column of the transformed g array by B*. Let 
the resulting arrays be D and G respectively. 

3. Multiply D and G component wise. Let the resulting array be 
called S. 

4 . Multiply every column of S by C’ and then multiply 



every row by C" . The resulting array will contain the 
elements of s. 

In the above algorithm, the transformation was first done along 
the rows and then along the columns. Alternatively it can be done 
along the columns first and then along the rows. The number of 
multiplications would be the sane for both the implementations but 
the number of additions would differ. To minimize the number of 
additions, it has been shown [13], the transformation should be 
done first along the dimension for which MCn^ l-nj/ACni) is 
minimum, where MCn^ ) and Afn^) is the number of multiplications 
and additions respectively for a convolution of length . 

3.3.3 Agarwal Cooley Convolution Algorithm 

For large convolutions, the algorithms derived from the 
interpolation methods become complicated and /efficient. The 
Agarwal-Cooley algorithm greatly simplifies the computation of 
long cyclic convolution by using a one-dimensional to 
multidimensional mapping suggested by Good[3,15], combined with 
the nesting approach proposed by Agarwal and Cooley[12]. 

In this section after an introductory treatment of the 
Agarwal Cooley algorithm, a fast algorithm is implemented for 
one-dimensional convolution of length N where N = ni'n2* ••• 
and n^'s are relatively prime. The transformation is done in place 
without actually transforming the data into multi - dimensional 
form. 

Consider again, the problem of computing the cyclic 


convolution 



(3.14) 


N-1 

Si = E d(i ’-k’ )• 6k 
k=0 

where N is a composite number 

N = ni-n2 


(3.15) 


with mutually prime factors nj and n2. This permits to define the 
one - to - one mapping 


i s (il.i2^ 


(3. 16) 


where ij and i 2 are defined by the congruence relations 


11 = i mod n^ , 

1 2 “ i mod n 2 


0 U ii S n- 


0 S i2 S n2 


(3.17) 


The CRT gives a unique solution i to the congruences (3.17) which 


is given by 


where 


i = ii-Mi-Ni + i2-M2*N2 0 S i < N (3.18) 


«1 = N / ni 
M2 = N / n2 


(3.19) 


Ml-Ni = 1 mod ni 
^2*^2 ~ ^ mod n 2 


(3.20) 


Substituting (3.18) for i and a similar expression for k in terms 
of (ki,k 2 ), the convolution (3.14) can be written 

n2-l ni-1 

eil,i2 = ^ ^ <l(il-kl), (i2-k2)'6kl,k2 

k2=0 k-i=0 

(3.21) 

where the indices of dii^i2 understood to be taken mod ni and 

n 2 respectively. Once the data has been converted into a two - 


dimensional form the algorithm for fast two 


dimensional 


convolution discussed earlier can be used. The result is converted 



back to one - dimension by using (3.18). To minimize the number of 
multiplications the following should be true : 

1. Multiplication with the pre - addition matrix A and B 
should be done first along the dimension for which 

[ M(n) - n]/A(n) is the least, where M(n) is the number 
of multiplications for n - point convolution and A(n) 

is the number of additions ; n is the number of data 

points along that dimension. 

2. The C operators should be placed in the reverse order 
to that used for A and B operator. 

The above treatment can be easily generalized for the 

multidimensional case. 

3.3.4 General Multifactor Algorithm for the Agarwal Cooley 
Algorithm 

In this section an algorithm is suggested for convolution of 
sequences of length N where N can be factored into r relatively 
prime factors. The algorithm is based on the Agarwal - Cooley 
algorithm for transforming the one - dimensional data into 

multidimensional form and the Vinograd’s algorithm for computing 
smaller convolutions. One of the important features of the 
algorithm is that, in contrast with the algorithms available in 
the literature, it does not require generation of the A,B and C 
matrices, thereby, in effect, reduces the memory requirement of 
the algorithm. 

3.3.4A Algorithm for large convolution : Let d.g be vectors of 

size N, this algorithm computes the vector s such that 



(3. 22) 


K- i 

Si = ^ 
k = 0 

1. Factorize N i-'to relatively prime factors n^’s such 
that 

N = nj^'n2' nj. (3.23) 

2. Choose the mixed radix number system (rij^ , . . . , n 2 , ) . 

The number in this radix number system is given by 

r 

i’ = E weight[k3 - itc (3.24) 

k=l 

where 

weight [13 =1 & 

(3.25) 

weight[k3 = weight [k- 1 3 ■ nj 5 ^_l k=2,...,r 

3. For i = 0 to N-1 find the residues i}^ , k=l,2,...,r such 
that 

ijt = i mod mj^ 

Use the equation (3.24) to compute i’. Scramble the 
elements of the input vector such that the ith element 
becomes i’th element. 

4. Set highrault=N , highorder=l, loworder=l and i=0. 

5. Set i=i+l;If i>r then go to step 10 else continue. 

6. Set loworder=loworder/ni . 

7. Repeat step 8 for highbit=0 to highorder-l , later go to 
step 5 . 

8. Repeat step 9 for lowbit=0 to loworder-1, later Jump to 
step 7 . 

9. for element=0 to n^-l compute 

pos=highmult*highbit + loworder* el ement + lowbi t . 

Set buf fl [ element 3=d[pos 3 and set buf f 2 [ el ement 3 =fi[ pos 3 . 



Transfor* the elements of the buffi and buff2 vector 


through Uinograd transformation. The new size of the 
Buffi and Buff2 vector is flCn^lxl. 

10. Set highiBult=highiBult /n^ ‘MCn^ ) . 

11. Put the transformed elements back in a new vector using 
the radix system as (B(ni ) , . . , MCnj ) , n £ + 1 , . . .n^.) . 

for element=0 to n^-l compute 

pos=highmult*highbit+loworder*element+lowbit . 

Set d ' tpos] =buf f 1 [ el ement ] and set g ’ [pos ] =buf f 2 [ el ement ] 

12. Set d=d’ and g=g’ . 

13. Set highorder=highorder *ll(ni ) . 

After the operations the new radix system would be 

CK(ni),n(n2) nCnp)). The result vector would be of 

size M(ni) •I1(n2) • . . . -MCnj.) . Multiply the components of 
the resulting vectors component wise. 

14. Carry out the inverse transformation similar to the 
forward transformation. 

15. Permute the elements of the result to obtain the 
elements . 

In the above algorithm the steps which require further analysis is 
the step 1, when the N is factored, and step 15 when the results 
are permuted back in natural order. 

3.3.4B Algorithm for factorization into primes : Given a 

positive integer n, this algorithm finds the prime factors 

Pl,P2»-*-*Pt n and also finds integers ni,n 2 n^. such that 

ni’s are relatively prime such that 

n = Pi*P 2'P3* ••• Pt (3.26) 

also n = ni-n 2 '...'nn where ni’s are relatively prime. 



Each of the factor iij can be expressed as 


ni = p'i (3.27) 

where each of the primes p’£ are distinct. The method makes use of 
an auxiliary sequence of "trial divisors" 

2=do<di<d2< 

which includes all prime numbers i Vn. 

1. Set t=0 , k=0, and let 1 be the number of prime numbers 
smaller than Vn. Set m=l.Set njB=l . 

2. If n=l, the algorithm terminates. 

3. Set r = n mod dj^ 

If r^O , go to step 7. 

5. Increase t by 1 , set Pt“<Ik 

• djj . 

6. If r = n mod dj^ =0 then return to step 5. Otherwise set 
m=ra+l and njj=l . 

7. If k < 1, then increase k by 1 and return to step 3. 

8. Set t by l.set pt=n, and set nj„=n ; termina.te the 
algorithm. 

3.3.4C Algorithm for permuting the results in natural order : 

The major steps in the algorithm are 

1. Converting the number to the mixed-radix 
representation . 

2. Use of the Chinese Remainder Theorem to obtain the 
solution of linear congruences. 

1. Set i=0. 

2. Set t=N ; for i=l to m do the following steps : 



N = nixn2 = 2x3-6 


Scrambling 


Compute 

- i mod nj 


i 2 ~ i mod n 2 

Find 

i’ = 3ii 4 i2 

Permut e 

the input vectors g and d such that 

Xi 

• = 


Pre addition 

Do the operation of pre addition 

on the 

scrambl ed 

d and g vectors along ij . Then 

do pre 

addition 

along i 2 . 



The resulting vectors are of size 

MCni)- 

M(n2) = 8. 

Let the resulting vector be 

called 

G and D 

respectively. 




Multiplication 

Multiply D ang G component wise and the result be 
called S. 


Post addition 

Do post additions along i2 and then along ij^ on 
the S vector. The resulting vector, s, will be of 
length N=6 . 


Unscrambl ing 

Permute the elements such that 


X! = X! • 


2. 


HULTIFACTOfi. CoNVOLUflOW 






1. Set k=l and set 1-m. 

2 . Set t = t /nj^ , 

3. Set veight[k]=t 

4. Set k=k*fl and set 1 = if 1 ^ 1 return to 

step 2; otherwise go out of the loop. 

3. Use the Chinese Remainder theorem discussed in Chapter 
2 to find the coefficients and such that 

i = E Mk-Nk-ifc 
k=0 

Lot ^k " ^k 

4. Do step 5 for i * 0 to N-1 then the algorithm 
terminates with the result in the buffer array. 

5. Set j'=0 ; for k=in downto 1 do the following steps. 

a. Set t = i rood nj^. Set j = j div nj^ . 

b. Set j’ = j’ + Sjf. 

Put the element at the ith position at the j’th 
position in a buffer array. 

Example 3.2 : Consider cyclic convolution of sequences of length 
N=6. It can be factored as 

N=mixm2=2x3 
Ml = 3 , W2 = 2 
Ni = 1 , N2 = 2 


The different operations are suianarized in Figure 2. 



CHAPTER FOUR 


DISCRETE FOURIER TRANSFORM 


4.1 Discrete Fourier Transform 
The DFT is defined by 


N-1 

Xj. = Z Xk exp(-2-n jrk/N) C4.1) 

k=0 

where X^. is the r co-efficient of the DFT and x^ denotes the k^ 
sample of the time series which consists of N samples. The 
can - be complex numbers and Xj. ’ s are almost always complex. For 
notational convenience (4.1) is often written as 


X^ = 


N-1 

E 

k=0 


IT rk 
Xk % 


(4.2) 


where 

Uh = exp(-2'nj/N) 

There exists the usual inverse of the DFT and, because the form is 
very similar to that of the DFT, the same alfiorithm can be used to 
compute it. The inverse of (4.2) is 

N-1 I 

XI = (1/N) E Xj. (4.3) 

r=0 

From (4.2), we can see that calculatinc all the N-DFT coefficients 
would require N* multiplications, which is unacceptable for large 
problems ; fast algorithms are needed. 


4.2 Cooley Tukey Algorithm 

This section describes an algorithm for computing the 
Fast Fourier transform based on a method proposed by Cooley and 
Tukey. As in their algorithm, the dimension n of the transform is 
factored (if possible) and n/p elementary transforms of dimension 
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p are done for each factor p of n. The fast Fourier transform, 
when computed in place, requires a final permutation step to 
arrange the results in normal order. 

4.2.1 Theory 

From (4.2), we have 


X,. = 


N-1 

E 

k=0 


,i rk 
Xk % 


(4.4) 


Let N=ni-n 2 and let the indices 

r = ni'r 2 + rj^ 
and k = n 2 ‘k]^ ^2 

Then we can write (4.4) as 


r and k be expressed as 

(4.5) 


^r2,rl 


n2-l 

E 

k2=0 


^ „ (nl- r2+rl)' (n2-kl+k2) 

^ k2‘WN 

ki=0 


2 ^ r2k2 „ rlk2 
2 Un2 '% 
k2=0 


(4.6) 

„ rlkl 
2 3'kl,k2'“nl 

ki=0 


From the above equation, it is easy to see that Cooley-Tukey FFT 

can be visualized as a mapping of one - dimensional array into 

two - dimensional array as shown in Figure 3 for N=15. The 

computation consists of an nj - point discrete Fourier transform 

on each column, followed by an element - by - element complex 

rlk2 

multiplication throughout the new array by Ujj , followed by n 2 - 
point discrete Fourier transform on each row. 

Equations (4.5) and (4.6) can be generalized for the 
roul t idiroensional case. The Fourier transform along the most 
significant digit in (4.5) should be done first and transform 
along the least significant at the end. This is evident froro 



C,4.5), in which kj is the roost sifinificant digit and in (4.6) 
Fourier transform along kj is done first. From equations (4.5) and 
(4.6) it is clear that if the Fourier transform is computed in 
place, i.e. the transformed samples are put back at the same 
position as the input elements themselves, then the result would 
be in bit reversed order. This is because the mixed - radix number 
system for the input and output index are reverse of each other. 
Therefore to obtain the output sequence in normal order, 
permutation operation is required. 

Algebraically, if N = ni-n 2 'n 3 ’ ... • n^ (4.7) 

then 


k = n2' n3' . . .n^- ki + .... + nm-km-i + kjn 

(4.8) 

r = ni-n 2 - . . . -nm-i-rm + .... +ni-r 2 + ri 

(4.9) 


Uhile taking the Fourier transform take the nj - point Fourier 
transform first and the n^^ - point Fourier transform last. Uhen 
doing the permutation operation put the element at location j, 
where 

j = n 2 -n 3 - . . .nn,- ji + + 3^-1 + J’m 

(4.10) 


at location j’, where 

j' = ni-n2- . . . -nm-i* jfli + +ni'j2 + Ji 

(4.11) 

Example 4.1 Let us consider a sequence of length N = 15. 

It can be factored as N = n^ x n 2 = 3 x 5 

The mixed radix representation of the input index i is be given as 

i = (ij , i2) = 5ii + i2 , il = 0,1,2 

i2 = 0,1, 2, 3, 4 



To compute the discrete Fourier transform it is neccessary to 


first do a 3 - point DFT. The index of different elements to 
obtained by changing ij vhile keeping 12 fixed. The Fourier 
transform of the three elements chosen is taken and the result is 
put back at the same position. The sampe operation is done for all 
values of 12 from 0 to 4 . 

In the second stage of operation requires computation of 5- 
point Fourier transform. As the Fourier transform is done along 
12 , which is the least significant digit, the samples whose DFT is 
desired are at consecutive positions. 

For unscrambling the resulting sequence the ith element 
should be shifted to i’th position, where 

i = 5ii + i2 » i-l ~ 0,1,2 

i2 “ 0,1, 2, 3, 4 

and i’= 3i2 + ii • 

4.3 Radix - 2 FFT 

This section is concerned with the development of a 
variation of the algorithm that is better adapted to parallel 
processing in a special purpose machine. The work is based on the 
algorithm proposed by Pease[8]. 

For the purpose of parallel processing in a special purpose 
machine, we require that the operations be organized in a few 
levels, each of which involves a set of elementary operations that 
can be done simultaneously. Preferably each level should involve 
only a single type of elementary process, so that the entire level 
can be initiated by a single command. Also there should be as few 
as possible distinct types of elementary operations, so that the 



parallel capability required shall be as simple as possible. 


For 


example, in the orifiinal algorithm each ”pass” involves a 
different grouping of the data into pairs. If possible, it is 
preferable to use only a single pattern of pairings which could 
then be "wired in”. 

4.3.1 Fast Fourier Transform 



0 0 0 

0 12 

0 2 4 

T = 0 3 6 


0 (N-1) 2(N-1) 


0 0 

3 N-1 

6 2(N-1) 

(4.14) 

9 3CN-1) 


(N-1)2 



In this notation, multiplication of entries becomes addition. The 


expression for T can be further reduced by noting that each term 
may be replaced by its principal value ( between 0 to N-1 ), 
modulo N. 

The transformation used by Cooley and Tukey accomplishes the 
same thing as T, except that the output is obtained in permuted 
order - specifically in digit reversed order, where the rows are 
numbered from 0 to N-1, as expressed in the base of the prime 
factor being used C in the present case, in binary notation). The 
transformation of this form is given, in general, by T’=Q*T, where 
Q is the appropriate permutation matrix. 

The matrix T’ can be obtained directly from T by permuting 
the rows of T as specified. For example, for N=8, after reducing 
all the terms modulo 8, we find that T is reduced exponent 
notation is given by 


T 


0 0 
0 1 
0 2 
0 3 

0 4 

0 5 

0 6 
0 7 


0 0 
2 3 

4 6 

6 1 
0 4 

2 7 

4 2 

6 5 


0 0 
4 5 

0 2 
4 7 

0 4 

4 1 

0 6 
4 3 


0 0 
6 7 

4 6 

2 5 

0 4 

6 3 

4 2 

2 1 


000 

001 

010 

Oil 

(4.15) 

100 

101 

110 

111 


where the binary row designation appears on the right. Then the 
modified transform is given by T’ as follows. 



00000000 


000 


04040404 

02460246 

06420642 

01230123 

05274163 

03614725 

07654321 


100 

010 

110 

C4.16) 

001 

101 

Oil 

111 


The key to the fast Fourier transform lies in the fact that Tjj' 
(when N = 2*^) can be first partition and then factored : 



where, in exponent notation, K = diafiC 0 1 2 3 ... N-1 ) and 0 

indicates the null matrix of appropriate dimension. 

The minus sien that occurs in (4.17) applies to the values 

of the sub - matrices that follow it, and not to the exponent 

coefficient that may actually be used. Since w is the nth root of 
M/2 

unity, w is a square root of unity or the value (-1). Hence the 
minus si£n can be translated to exponent notation by adding N/2 to 



the exponent coefficients involved. 

It should also be noted that Tjj' is written in terms of v, 

the principal Nth root of unity. Tn/ 2’ vould normally be written 

. 2 

in terms of principal (N/2)nd root, which is w . Hence, if 

equation (4.17) is actually written out, the entries in Tjj/ 2 ’ will 

be twice those that would occur were it written in terms of the 

(N/ 2 )nd root . 

The algorithm can now be iterated by applying the 

factorization of Equation (4.17) to each occurrence of Tn/ 2'» 
so on. If the process is carried to completion the Fast Fourier 
Transform is obtained. 

For the purpose of parallelization, the process given 
suffers from the defect that a coefficient in a given location is 
combined with the coefficients in different location each time the 
sum and difference are formed. For example, if equation( 4 . 17 ) is 
carried over to next stage of iteration. 




The last factor, which is the first operator applied to the data, 
forms the sum and differences of coefficients in the first half 
with the corresponding terms in the second. The third factor forms 
the sum and differences of the coefficients in the first quarter 
with corresponding terms in the second, and the coefficients in 
the third quarter, with those in the fourth. This prevents "wired- 
in" data paths, or else it requires an excessive number of such 
paths . 

Therefore, there is a need to modify the algorithm to avoid 
this difficulty. 

4.3.2 Modified Fast Fourier Transform 

The recursion formula of equation (4.17) can be written in 
the form T'n = CT’n/2 x I 2 )DnCIn/ 2 xT 2 ’ ) (4.18) 

where Dfj is the NxN diagonal matrix, quasidiagd K) , and T '2 is 
given by 



where in the later matrix the coefficients are the actual values, 
not exponents. The multiplication sign in (4.18) indicates the 
direct, or Kronekar product of the sub - matrices, Indexed first 
on the Indices of the first component and then on the indices of 
the second. For example, if A = ( a^j ), B = ( b^h with all the 
indices being either 0 or 1 , then 



A X B 


ab ab ab ab 


00 00 

01 00 

00 01 

01 01 

b 

10 00 

a b 

11 00 

a b 

10 01 

a b 

11 01 

b 

00 10 

a b 

01 10 

a b 

00 11 

a b 

01 11 

b 

10 10 

a b 

11 10 

a b 

10 11 

a b 

11 11 


The Kronekar product can be combined with matrix multiplication 
throufih the formula, 

(A X B3CC X D3 = (AC) x (BD) 

provided that the dimensions of the matrices are compatible. In 
particular, it can written that 

CAB X I) = (AB) X = (A X I)(B x I) 
or, more fienerally, if A,B.C,... are all kxk matrices, then 

(ABC ) X I = (Ax I)(Bx I)(Cx I) ... 

Using this formula, we can apply eqn(4.18) iteratively and obtain, 
when N=2”, 


T’n 


C T' 2 X I 2 X . . . . X I 2 )' Qi" 

C I 2 X T’2 X X I 2 )'Q2- 

(4.20) 


. c I2X I2X .... xT ’2 ) 

where each of the cross - products includes n factors, of which 
(n-l) are 2x2 identity and one is T’ 2 . and the are the diagonal 



matrices. For example, for N=8, T’s ~ (T’ 4 x 12)03(14 x I2). 
since I4 = I2 x I2 and 

T’4 = (T’2 X I2)D4(I2 x T’ 2 ) 

we find that 

T'a = ( [CT’2 X I 2 )D 4 CI 2 x T'2)] x 12)03(14 x I2) 

= ((T ’2 X I2 X l2)(04 X I 2 )(l 2 x I2 x T’2))08(l2 x I2 x T’2) 

( 4 . 21 ) 

= ((T ’2 X I 2 X l2)Ql(l2 X I 2 X T’2))Q2Cl2 x I 2 x T’ 2 ) 

where Qi = D4 x 1 2 , Q2 = Os . 

For the purpose of parallelization the difficulty now with 
©qnC 4 . 20 ) is that T'2 occurs in different locations in the various 
factors. There exists a permutation operator, P, such that 

P(T’2 X An/2)P"^ = An/2 * T’2 ( 4 . 22 ) 

If so, we can write each of the product terms in ( 4 . 20 ) in terms 
of a sinfile one of them, say the first 


C = T’ 2 X I2 X I2 X I2 X . . . x I2 
= T’2 X In/2 


( 4 . 23 ) 


By considering the behavior of the product operators on an 
arbitrary vector, a possible P is the "ideal schuffle", in which 

T 

p-[xo xj ... Xn/2-1 xn/2 xn-1 ] 

T 

= [XQ xn/ 2 xi Xn/2+1 xn/2-1 xn_i ] 


( 4 . 24 ) 



or, for N=8,for example 


10000000 
00001000 
01000000 
00000100 

(4.25) 

00100000 
00000010 
00010000 
00000001 

where the entries, in this case, are the actual values, not 
exponents . 

With this P, 

l2 X T’2 X lN/4 = PCT'2 In/2)P’^ = PCP"^ 

l4 X T’2 X In/8 = P^CP“^ 

where C is defined in (4.23). T^'en (4.20) becomes 

T’n = CEiPCP"^E2P^CP"^E3. . .P”“^CP“^^“^^ (4.26) 

E’i and E” i are now defined by 

Ei = = P^E”iP“^ 

which permits to pass the diagonal operator through the 
permutations. The operators E’i and E'’i will also be diagonal , 
but with the values on the diagonal permuted. Using these 
transformations and noting that, since P^ = 1 , P = P, (4.26) 



becomes either 



T’n = CE’iPCE’2PCE’3 PCP 


C4.27) 


or 

T’n = CPE”iCPE”2CPE”3 CP C4.28) 

The E’i and E”i matrices are diagonal and have the effect of 
multiplying selected items of the data set by various powers of w. 
Each of the matrices can be further factorized into diagonal 
operators, each of which multiplies a selected subset of the data 
set by a common multiplier which is a power of w. For example, for 
N=16, E ’ 1 is given by 

E’l = diagC 0004020601050307) 

= F'l F’2 F’4 

where 

F*i = diagC 0000000001010101) 

F'2 = diagC 0000020200000202) 

F ’4 = diagC 0004000400040004) 

in exponent notation. In particular, each E’i or B”i can be 

factored into the product of F’(2 ) ^”C2 )» F’ and F” 

2 

multiplies exactly N/4 of the data by w 

The alternative factorizations for N=16 are given by 
F ”4 = diagC 0000000004040404) 

F ”2 = diagC 0000000000220022) 

F”i = diagC OOOOOOOOOOOOllll) 

Uith these factorization of the E - operators (4.27) and C4.28) 
becomes 


T’16 = CF-(4)F'C2)F'ci)PCF’c4)F’(2)PCF’(4fCP 

4 - 2 V ) 



Since the diagonal matrices commute the subsequence of the F 
operators can be permuted as convenient. 

The general rules for the F - matrices are as follows : Ue 
number the positions along the diagonal from 0 to (N-1), and 
express these numbers in binary form. Then, in F’(2-)» "the 
exponent coefficients are 2*” in those positions in which the 
(m+Dst and the last digit are 1. All other coefficients are 0. 
Similarly, in general, the F”(2 ) matrix has the exponent 

coefficient 2^' in the positions for which, in the binary 
representation, the first and the (m+2)nd digits are 1. 

The final results, given in (4.27) and (4.28), can be 
written as 

n 

T’n= 2"’ = TT (CE’mP) (4.31) 

trt m 1 

= fr (CPE”m) (4.32) 

mat 

where E’j, and E”in are the diagonal matrices defined above, and 
E’n “ E”n ~ In a parallel processor, each factor represents one 
''pass”, pass being defined as a single sequence of parallel 
processing operations involving, in (4.31), first a permutation of 
data set; second, the multiplication of the selected elements of 
the data set by appropriate powers of w; and finally replacement 
of the data set by the sum and differences of adjacent elements. 

4.4 Good Thomas Algorithm 
4.4.1 Theory 

The second type of FFT is the Good Thomas algorithm. The 
length of the data sequence is factored into relatively prime 



factors. It is more difficult conceptually but like Cooley 
Tukey, it relies on transformation of uni-dimensional data into 
multi - dimensional data. Uhen used in conjunction with Cooley 
Tukey it can lead to more efficient algorithm. The Good - Thomas 
algorithm rearranges the input one-dimensional data such that the 
transform can be done by multidimensional transform without the 
intervening "twiddle factors”. 

The derivation of the Good-Thoroas FFT algorithm is based on 
the Chinese Remainder theorem for integers. The input index is 
described by its residues as follows : 

' i” = i (mod n”) C4.33j 

i’ = i (mod n’) C4.34) 


By Chinese Remainder Theorem there exists integers N' and N” such 
that the input index can be recovered as follows : 

i = i’N’n” + i”N”n’ C4.35D 

where the integers N’ and N” are the integers that satisfy 

N’n” + N”n’ = 1 C4.36) 

The output index is described somewhat differently. Define 


k’ = N'k mod n' 
k” = N”k mod n” 

the output index k can be recovered as follows : 

k = n”k’ + n'k” (mod n) 


(4.37) 


(4.38) 


Substituting the expressions for the input and output indices 


in 



the DFT formula, 


^k’ ,k” 




i’k’ i”k” 
(3 T 


Vi 


(4.39) 


where 3 and t are actually the primary n’th and the n”th root of 
unity needed for the n’-point Fourier transform and the n”-point 
Fourier transform respectively. 

The Good-Thomas al£orithm can be applied as lonfi as the 
factors are relatively prime. Any further factorization is not 
permissible. To break the problem further there is a need to use 
the Cooley - Tukey Algorithm along with the Good - Thomas 
algorithm for finding FFT for power of a prime. 

The above description leads to the following algorithm : 

4 . 4. <A Good Thomas Algorithm : 

1. Read n, {x[ i ] , i = 0 , 1 , 2 N-1} 

2. Factorize n in the form 

c c c 

n = Pi P 2 • • • Pm 


(4.40) 


- ni • n2 * .... njj, 

3. Find the values of the coefficients used in the 
expression of CRT. 

Input index : i , i = 0,1,2, ... ,N-1 

Output index : j , j= 0,1,2, ... ,N-1 
1 ” X mod • 


m 

i = Z Nic Ilk ik (4.41) 

k=l 

where Mk = N / nk and Mk’Nk “ ^ rood nk 

Find the values of Ilk algorithm described 



By DFT 


ik — > Jk 

but 


(4.46) 


Jk “ j mod k=l,2,. 

and 

m 

j = E (mod n) 

k=l 


From (4.41), define 


.m (4.47) 


j' = E Nk Mk jk 
k=l 

The element of the result sequence at the j ’ th position 
should be at the jth position. 

8a. Find Jk = j’ rood nk k=l,2,3 N-1. 

j’=0,l,2, . . . ,N-1. 

8b. Use (4.48) to find j. 

8c. Put X( j’ ) in X’ ( j) . 

4.5 Computation of Discrete Fourier Transform by Convolutions 

4.5.1 Rader Prime Lencth Alfiorithro 

The Rader prime algorithm can be used to compute the Fourier 
transform in any field F whenever the blocklength N is a prime. 
Because p is a prime the structure of GF(p) is used for re- 
indexing the components of the input sequence. 

Let IT be a primitive element in GF(p). Then each integer 
less than p can be expressed as a unique power of -n . The Fourier 
transform can be written with the input sequence index i and DFT 
coefficient index k expressed as a power of tt . Because i and k 


take the value zero and zero is not a power of tt 


the zero 



frequency component and the zero time component must be treated 
specially. Then 


P-1 

^0 = ^ Xi (4.48) 

1 = 0 


For each i from 
unique integer 
function rCi) 
{1,2, . . . .N-1} ; 


Xq = xq + E 
i = l 

1 to N-1, where N=p in this case, let rCi) be 
from 1 to N-1 such that in GF(p) , = i. 
is a map from the set { 1 , 2 , . . . .N-1 } onto the 
it is a permutation of { 1 , 2 , . , . ,N-1 } . Then 


the 

The 

set 

can 


be written 


N-1 ■^Ct),yCK} 

XQ + E Un" • 
i = l 

Because r(i) is a permutation, set l=r(k), set j=N-l-r(i), and use 
j as the index of summation to get 


N-1 e-j 

X^e= XQ + X Un^ • 

j=l 


N-2 (2-j 

X’l = XQ + E Un"" • x'j 
j=0 

where X’l = X^C and x’ j = x^"''"'^are scrambled input and output data 
sequences. This is now the equation of a cyclic convolution 

■fT^ 

between x' j and Un - Thus, by scrambling the input and output 
indices, the problem of Fourier transform can be converted into a 
problem of convolution. 

The above formulation, when the input sequence is 



real /coinpl ex , results in convolution of real /coinpl ex sequence and 
a complex sequence. It is possible to reduce the computation 
effort by exploiting the symmetry of the complex sequence composed 
of different powers of unity. The result is summarised in the 
following theorem : 

« 

X 

Theorem 4.1 : Let g(x) be a Rader polynomial, where ei = 

For every odd prime p, the coefficients of gCx)(mod -1) 

are real numbers, and the coefficients of gfxJCJnod x^^ 
are imaginary numbers. 

Using the above theorem the algorithm for DFT for prime length 
is given as follows : 

4.5. lA Rader Algorithm for prime length : Given a real 

sequence x^ , i=0 , 1 , 2 , . . . ,N-1 where N is a prime this algorithm 

will compute its DFT coefficients using the Uinograd algorithm for 
fast convolution. 

1. Find the primitive element it in GF(N). 

N-1 

2 . Set Xq = E Xi 

i=0 

3. Scramble the elements of the x vector such that the 

N-1- j 

element at position j ’ =ti is at position j for 

j=0 , 1 , 2 N-2 . 

4. Process the elements of the resultant vector such that 
the new contents are defined by the following : 

Xi = Xi + Xi + (;N_i)/2 

3Ci + (N-l)/2 = Xi - Xi + (N-l)/2 


for i=l,2, . . . ,(N-l)/2 



Generate a new sequence , i=l,2 N~1 such that 

= cos ( 0 ' IT ^ ^ ) 

^i + (N-l)/2 = sin(e-TT^~^) for 0 i i S CN-l)/2 


Partition the vectors x and h into and 

\h^^^ respectively such that 


C2D 

’^(i-l) = Xi + (N-l)/2 

i = 1.2,3 (N-l)/2 

h(i_i)^^^ = hi 

"(i-1) - lii+(N-l)/2 

i = 1,2,3 CN-l)/2 

7. Convolve x^^^ with h^^^ and x^^^ with h^^^. Let the 

result be in and X^^^. 

8. Combine the the vectors X^^^ and X^^^ to obtain the 
resultant vector X with 


Xj = X^^^i + 3-X^^^i+(N-l)/2 , 1 i i i (N-1)/2 

= X^^^i - j-X^^^i+(N-l)/2 . (N-l)/2+l S i i (N-1) 

9. Set Xi = XQ + Xi for i = 1 , 2 , 3 , . . . ,N-1 . 


4.5.2 Uinofirad Fourier Transform Algorithm for power of an odd 
prime 

The idea of Rader Algorithm can be used even for sequences 
of length N=p*^, where p is a odd prime is considered. As discussed 
in Chapter 2 that primitive roots u modulo p always exists and 



that these primitive roots are of order ^(p-1). Thus it is 
possible to convert a DFT of dimension p*^ into a cyclic 
convolution of length p*^ plus some additional terms. It 

can be shown that the DFT of size p*^ can be partitioned into two 

• C “ 1 c — 1 

DFTs of size p and one convolution of length p CP~1)- The 

procedure of breaking up can be used recursively to convert the 
c-1 

DFTs of size p into convolutions. 

One of the operation which is required for implementation is 
finding the primitive root modulo p*” for m=l,2,...,c. This 
operation makes the algorithm computationally inefficient. The 
following theorem reduces the problem to just finding the 
primitive root modulo p. 

Theorem 4.2 ; If an element is a primitive root modulo p, then it 
is also a primitive root modulo p*^ , but the converse is not true. 

Using the above theorem and the discussion before that 
algorithms that are computationally efficient can be implemented. 



CHAPTER FIVE 


CONCLUSIONS 

Various algorithms for the computation of discrete 
convolution and Fourier transform are implemented. The matrix 
based formulation of the Uinograd algorithm for the computation 
of short convolutions are implemented for sequences of length 2, 
3. 4, 5, 7 and 9. For convolution of long sequences it is 
necessary that the relatively prime factors of data sequence 
length, N, should belong to the set F = { 2 , 3 , 4 , 5 , 7 , 9 } . Also for 
convolution of multidimensional data, the size of each dimension 
should satisfy the above condition. 

The mixed radix Cooley-Tukey algorithm and the Good-Thomas 
algorithm are quite general and can be applied for any lengths. 
But for using the Rader algorithm or the Uinograd algorithm it is 
necessary that the length of resulting convolutions has its 
relatively prime factors belonging to F = { 2 , 3 , 4 , 5 , 7 , 9 } . 

It is attempted to keep the complexity of the algorithm at the 
minimum without compromising the speed and the memory requirements 
of the programs. The trignometric functions, namely sine and 
cosine, are generated and stored before running any discrete 
Fourier transform program. This results in an appreciable increase 
in the running time as they are used repeatedly in the computation 
of discrete Fourier transform. To minimize the memory requirements 
of the programs the dynamic memory allocation facility of Pascal 
is utilized. All the intermediate variables are released back to 
the heap memory after their use. 

All the algorithms have been written as procedures. This 



allows sharing of a common set of procedures for all the 
algorithms. All the related procedures are kept in a single file 
as Pascal units. The use of units permits storing the procedures 
in compiled form. 

5.1 Scope for future work 

One of the serious restriction of the convolution algorithm 
implemented is that the length of the convolution should have 
relatively prime factors belonging to the set F= { 2 , 3 , 4 , 5 , 7 , 9 } . The 
algorithm for convolution of sequences length of a power of an odd 
prime can be very useful. 

All the algorithms implemented in this thesis are intended 
for operations on real and rational field and integer ring. The 
factorization of polynomials are done over rational field. Use of 
polynomial transforms can reduce the computational complexity of 
the algorithms by factorizing the modulo polynomial over 
polynomial extension fields. The algorithms already implemented 
for Uinograd convolution based on the polynomial formulation can 
be used for further development. Also a scheme for generating the 
A, B and C matrices can be developed from the polynomial 
formulation. The resulting matrices may not be the roost optimum 
but can serve as a starting point for further reduction of 
computational complexity. 



I 


LIST OF PROCEDURES 

I.l Discrete Fourier Transform 

Unit Name : CompDef 

Constant Declarations : 

NaxSamp = 100; 

UnitSize = 64; 

NoOf Types = 6; 

NoOfFactors = 15; 

NoOfPrimes = 20; 

Type Declarations : 

Comp_no = array [1.. 2] of real; 

Comp_Vector = arrayt 0 . .MaxSamp] of Comp_no; 

Comp_Unit = array [ 0 .. UnitSize] of real; 

Descriptor = array[ 1 . .NoOfTypes , 1 . . 6 ] of integer; 
FactorArr = array [ 1 .. NoOfFactors ] of integer; 

Primes = array[ 1 . .NoOfPrimes] of integer; 

Procedures and functions : 

1. procedure TimeStart (var irword; var j:word; var k:word; 

var Tick :real); 

Input : None 
Output : i.j.k.Tick 

Records the current minute in i, current second in j and 
decimal second in k. The variable Tick is initialized to 


zero . 


2. procedure TimeStopC i , j ,k: word ; var Tick); 

Input : i , j ,k 

Output : Tick 

Calculates the time taken in seconds between the current 
time and the time when i,j and k were stored using the 
TimeStart procedure. 


Unit Name ; Dft 


Uses CompDef 

Procedures and functions : 

1. procedure Comp_riult(a,b : Comp_no; var c;Coinp_no): 

Input : a,b 

Output ; c 

Ilultiplies complex numbers a and b and stores the result in 

c . 

2. procedure CmpMlt InCvar a:Comp_no; b:Comp_no); 

Input : a,b 

Output : a 

riultiplies a and b and stores the result in a. 

3. procedure DftBrutefvar InVector : Comp_Unit ; 

n.flag; integer); 

Input : InVector, n, flag 
Output : InVector 

Computes the discrete Fourier transform of InVector of size 
n. If Flag = 1 the DFT is computed else if Flag = -1 the 


IDFT is computed. 



Unit Name ; 


PrimRoot 


Uses CompDef 

Procedures and functions : 

1. function modsqrCvar xrintefier; var n integer) : integer ; 

2 

Computes x mod n. 

2, function roodroult (var xiinteger; var y:integer; 

var n: integer) : integer ; 


Returns x*y mod n. 

3. function FastExpCvar arinteger; var z:integer; 

var n: integer) ; integer ; 

Returns a mod n. 

4. function primitiveCbase : integer ): int eger ; 

Returns the primitive element of GFCbase) . It requires 
base to be a prime number. 


Unit Name : Factor 


Uses CompDef 

Procedures and functions : 

1. procedure Factorize(n : integer ; var Factor : FactorArr ; 

var S_no : integer ; var prime zboolean) ; 

Input : n 

Output : Factor ,S_no, prime. 

Prime is true if n is a prime number. The factors of n are 
stored in factor array and S_no indicates the number of 


factors . 



Unit Name : 


Radix2 


Uses CompDef 

Procedures and functions 

1. procedure Df tRadixZ (var InVector : Comp_Vector ; 

n,m; intecer) ; 

Input : InVector, n,m 
Output : InVector 

Computes the DFT of InVector of size n = z"*. The result is 
stored back in InVector. 


Unit Name ; Thomas 


Uses CompDef , Dft , Factor .RadixZ 

Procedures and functions : 

1. procedure ComRadixCvar Buff : Comp_Unit; 

bas e, power : integer) ; 

Input ! Buf f , base , power 
Output : Buff 

pow^r 

Computes the DFT of the vector Buff of size base using 

the Cooley Tukey algorithm, where base is an odd prime. 

2. procedure ThomasDf t (var InVector : Comp_Vector ;N : integer ) ; 

Input : InVector, N 
Output : InVector 

Computes the DFT of InVector of dimension N. The procedure 
utilizes the Good—Thomas Algorithm. 




Unit Name ; 


PrimeDf t 


Uses CompDef , Dft , PrimRoot , Factor 

Procedures and functions : 

1. procedure OddPrimeC var InVector : Coinp_Unit ; 

base, power ; intecer); 

Input : base, power , InVector 
Output : InVector 

Computes the DFT of InVector of length N = (base)^®'^^*^ . 


The 


base should be a power of an odd prime. 


I.Z 


Discrete Convolution 


Unit Name : UinoDef 


Constant Declarations : 
nsize = 20; 
nlsize = 20; 
nZsize = 20; 
maxdim = 20; 

HaxFac = 4 ; 

Type Declarations 

OneDim = array[ 0 .. nsize] of real; 
TwoVec = array[0..1] of onedim; 

SmlVec = array[ 0 . . MaxDim] of real; 
TwoElmt = array[1..2] of real; 

Polynom = array[0 . .nlsize] of twoelmt; 
TwoDim = array[0. .nlsize] of polynom; 
FacArr = array[ 1 . .HaxFac] of integer; 
Fac2Arr = array[0..1] of FacArr; 


Unit Name : ImageDef 


Uses UinoDef 

Constant Delarations 
MaxLenl = 40; 
riaxLen2 = 40; 

Type Declarations 

ImageArr = array [ 0 . -MaxLenl , 0 . .MaxLenZ ] of real; 

= array[0..1] of FacZArr; 


AllFac 


Unit Name : 


Uino2pro 


Uses UinoDef 

Procedures and functions ; 

1. procedure UinoZfrdCvar d: SmlVec); 
Input : d 

Output : d 

Does the operation of pre - addition on 
vector of length 2. The resulting vector is 

2. procedure Uino2frgCvar d; SmlVec); 
Input : g 

Output : g 

Does the operation of pre - addition on 
vector of length 2. The resulting vector is 

3. procedure Uino2RevCvar s: SmlVec); 
Input : s 

Output : s 

Does the operation of post - addition on 
vector of length nC2). The resulting vector 


the elements d 
of size M(2) . 


the elements g 
of size n( 2) . 


the elements s 
is of size 2 . 


Algorithms for input sequences of size 3,4,5, 7 and 9 are 
implemented. The name of the procedures are indentical to 
the ones listed above with 2 replaced by appropraite number. 


Unit Name 


Uinosub 


WinoZpro, Uino3pro, Uino4pro, Uino5pro, 

Uino7pro 

Procedures and functions 

1. function FindMultfi : integer) : integer ; 

Returns the number of multiplications required to compute i 
- point convolution using Uinograd algorithm. 

2. procedure UinoProc(way ,NoOfPts : integer ; 
var original ; SmlVec); 

Input : way .NoOfPts , original 
Output : original 

Does the pre-addition or post addition operation on the 
elements of original vector of size NoOfPts. The operation 
to be done depends on the value of variable way. 

Uay Operation done 

1 Pre - addition corresponding to 

d vector. 

2 Pre - addition corresponding to 

g vector. 

3 Post addition 


Unit Name : Uino2dim 

Uses Uinodef. Uino2pro, Uino3pro. Wino4pro, UinoSpro, 

Uino7pro .UinoSub 


Procedures and functions : 


1, procedure AfiClyFrCvar c: onediin; var d; onedim; 

n.nn.mm : integer; var data: twodiin) ; 
Input : n,nn,mro,g,d 
Output : data 


Converts two one-dimensional vector of length n = nn x 
mm into a two - dimensional data using the Agarwal- 
Cooley algorithm. 

2. procedure AgClyRev(var s : onedim; n,nn, mm: integer; 

var data : twodim); 

Input : data,n,nn,mm 

Output ; 8 

Does the inverse operation of AgClyFr ; it converts a 

two-dimensional data into a onedimensional data. The 
dimension of the data vector is nn x mm - n. 

3. procedure processorC check: int eger ; var data : Twodim, 

nn.mm linteger; var nml : integer; var nm2 : integer) ; 
Input • nn , inin , data 

Output ! nml > nin2 , data 

Computes the Uinoersd trsnsform alon£ the rows and columns 

of data. Ths numbsr of rous and columns are nn and mm 

respectively. The order of the resultant matrix is nml x 

nm2. If check = 1 then the ’d’ transform Is done, and if 

check=2 then ’b' transform is csrri.d out. 

3. procedure reverseCvar data: Tvodim: nn . mm , nml , nm2 


: integer) ; 


Input : nn,mm,nml,nm2,data 


Output : data 



Computes the Inverse Uinograd transform along the rous and 
columns of data. The number of rows and columns are ntnl and 
nm2 respectively. The order of the resultant matrix is nn x 
mm. The operation can be viewed as the inverse of processor. 


Unit Name : Uinoform 


Uses 


Winodef , UinoZpro , Wi no 3 pro , Wino4pro , WinoSpro , 
Uino7pro,Uinosub 
Procedures and functions 

1. procedure scramblefscram ; byte; var Vector ; onedim; nn, 

nofac : integer; var Factor ; FacZArr ; 
var weight : FacArr); 

Input : scram,nn,nofac , Factor .weight , nofac .Vector 
Output : Vector 

Scrambles the elements of the input vector Vector. The 
length is n = nn+1. The relatively prime factors of n 
are given in Factor with nofac indicating the number of 
relatively prime factors of n. weight indicates the 
weights for the mixed radix representation of input 
index. If scram is 0 then scrambling is done, else if 
it is 1 then unsrambling is done. 

2. procedure transformC flag: byte ; var ProVec ; onedim; 
nofac, n, NoOfMult : integer; var Factor : Fac2Arr ; 

var weight : FacArr ) ; 

Input : f lag. ProVec, nofac, n. NoOfMult .Factor. weight 


Output : ProVec 



Computes the forward Uinograd transform or the reverse 
Uinograd transform depending on the value of flag. If 
flag is 1 then the ’d’ transform is done ; if flag=2 
then 'g' transform is done and if flag=3 then the 
reverse ’s' tramsform is done. The meaning of other 


variables remain the same. 



1.3 Folynoinial Alfiebra 


Unit Name PolyDef 


Constant Declarations 
MaxTerins = 12 
Type Declarations 

PolyType = array[ -1 . .MaxTerms ] of real; 
IntPoly = array[ -1 . .MaxTerms ] of integer; 


Unit Name : Polyint 


Uses PolyDef 

Procedures and functions 

1. procedure modulo(var a : intpoly ; var m lintpoly); 
Input : a,m 

Output : a 

Computes a mod m, where a and ro are integer polynomials. The 
result is stored back in a. 

2. procedure Multiplyfvar a: intpoly; var b: intpoly); 

Input : a,b 

Output : a 

Multiplies integer polynomials a and b. 

3. procedure Division( var dividend ; intpoly; 

var divisor : intpoly; 


var quotient : intpoly); 




Input : dividend , divisor 
Output : quoti ent , dividend 

Divides dividend by divisor to give the quotient polynomial 
and the remainder is stored in dividend after execution. 
This procedure is suitable only for cases when the results 
are known to belong to ring of integer polynomials. 

4. procedure PseuDiv( var ddpoly : intpoly ; 

var drpoly : intpoly ; 
var qupoly : intpoly ; 
var denom ; integer); 

Input : ddpoly , drpoly 
Output : qupoly , denom 

Divides integer polynomials ddpoly and drpoly. The resulting 
quotient polynomial is stored in qupoly and the remainder in 
ddpoly. Denom stores the common denominator for remainder 
and quotient as they can be rational in the general case. 


Unit Name : Polynom 


Uses PolyDef 

Procedures and functions 

1. procedure AddRealfvar arpolytype; var b;polytype); 
Input : a,b 

Output : a 

Addition of real polynomials a and b . 

2. procedure rmodCvar a:polytype; var m: intpoly); 
Input : a,m 

Output : a 


computes a mod m. 





3. procedure rmultCvar a;polytypef bjpolytype); 

Input : a,b 

Output : a 
Multiplies a and b, 

4. procedure rinultCvar arpolytype; var brintpoly); 

Input : a,b 

Output : a 

Multiplies real polynomial a with an integer polynomial 
b. 


Unit Name : gcd 


Uses PolyDef 

Procedures and functions 

1 . function TvoGCDCx,y : integer) : integer ; 

Returns the gcd of two integers x and y. 

2. function cont(var numbers : intpoly ) : int eger ; 

Returns the content of the polynomial numbers. 

3. procedure PrimPolyCvar numbers : intpoly ; var k; integer); 

Input : numbers 

Output : numbers, k 

Reduces numbers to primitive form and k returns the content 
of numbers. 

4. function leader(var numbers : intpoly) ; 

Returns the leading coefficient of the input polynomial 


numbers . 



Unit Name : 


Rational 


Uses PolyDef , Polyint , ficd 

Procedures and functions : 

1. procedure ShowOf f (n , d : int eger ; var arintpoly); 

Input : n,d,a 

Output : None 

Displays a rational polynomial a with the common numerator n 
and denominator d. 

2. procedure PolyCopyCn.d; integer ; var a:intpoly; 

var nn; integer; var dd: integer; 

• var b : intpoly) ; 

Input : n,d,a 

Output : nn,dd,b 

Copies the input variables onto the output variables. 

3. procedure Reduce(var i: integer; var j: integer; var 

k : integer) ; 

Input : i,j 
Output : i , j ,k 

Sets k = gcd(i,j). Also reduces i and j such that i=i/k 
and j=j/k. 

4. procedure AddRat(var na: integer; 

var da: integer; 
var a : intpoly ; 
nb: integer; 
db: integer ; 


var b : intpoly) ; 



Input : na , da , a , nb , db , b 
Output : na,da,a 

Adds rational polynoniials a*(na/da) and b*(nb/db). 

5. procedure SubRatCvar na: integer; 

var da: integer ; 
var a : intpoly ; 
nb : int eger ; 
db : int eger ; 
var b : intpoly) ; 

Input : na , da , a , nb , db , b 
Output : na,da,a 

Subtracts rational polynomials a*(na/da) and b*Cnb/db). 

6. procedure HultRatCvar narinteger; 

var da:integer; 
var a : intpoly; 
nb : integer ; 
db: integer; 
var b : intpoly); 

Input : na , da , a ,nb , db , b 
Output : na,da,a 

Hultiplies rational polynomials a*(na/da) and b*Cnb/db). 

7. procedure DivRatCvar na: integer; 

var da: int eger ; 

var a : intpoly; 

nb: integer; 

db : int eger ; 

var b : intpoly) ; 
var nc: integer; 

var dc: integer; 


var c ; intpoly) ; 



Input : na , da , a , nb , db , b 
Output : na , da , a, nc , dc , c 

Divides rational polynomials a*Cna/da) and b*Cnb/db) 
The remaider is stored in a and the quotient in c. 


Unit Name : InVer 


Uses FolyDef .Polyint ,ficd, rational 

Procedures and functions : 

1. procedure InVert(var vrintpoly; var urintpoly; 

var npp ; int efier ; var dpp ; int eger ; var pp; 
intpoly 3 ; 

Input : v,u 
Output : npp.dpp.pp 


Computes the inverse of v mod u. 
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